Get the data and perform dimension reduction

Here, I get the data from the database and then reduce its dimensionality using factor analysis.

Get the data first.

Dimension reduction

In this part, I perform the dimension reduction using factor analysis.

Choosing the Number of Factors

I choose 3 factors based on the criterion of keeping only the factors whose eigenvalues are greater than 1 (the Kaiser criterion).
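The eigenvalue rule can be sketched as follows. This is a minimal illustration on synthetic data, assuming the eigenvalues are taken from the correlation matrix of the input features; the helper name `count_factors_kaiser` and the generated data are hypothetical and do not come from the original analysis.

```python
import numpy as np


def count_factors_kaiser(X):
    """Count factors with eigenvalue > 1 in the correlation matrix of X."""
    corr = np.corrcoef(X, rowvar=False)           # features are columns
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # sorted in descending order
    return int(np.sum(eigenvalues > 1.0)), eigenvalues


# Synthetic stand-in for the real feature table (assumption):
# 6 observed features driven by 2 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.3 * rng.normal(size=(500, 6))

n_factors, eigenvalues = count_factors_kaiser(X)
print("eigenvalues:", np.round(eigenvalues, 3))
print("factors kept:", n_factors)
```

Because the eigenvalues of a correlation matrix sum to the number of features, an eigenvalue above 1 means the factor explains more variance than a single standardized feature would.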

Performing Factor Analysis

Get the scores of the new features
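As a rough sketch of this step, the factor scores can be computed with scikit-learn's `FactorAnalysis`. The library choice, the standardization step, and the synthetic input are assumptions; the original analysis may have used a different package or a rotation.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real feature table (assumption): 300 samples, 8 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))

# Standardize first so that every feature contributes on the same scale.
X_std = StandardScaler().fit_transform(X)

# Fit a 3-factor model, matching the number of factors chosen above.
fa = FactorAnalysis(n_components=3, random_state=0)
scores = fa.fit_transform(X_std)  # factor scores, one row per sample
loadings = fa.components_.T       # loadings, one row per original feature

print(scores.shape, loadings.shape)
```

The loadings matrix is what helps interpret each factor: a feature with a large absolute loading on a factor contributes strongly to it.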

Understand the new factors

Commuting Pattern Recognition

From this point on, I identify the commuting vehicles. The general process is to cluster the samples first and then train a decision tree model to extract the commuting rules.

Get the optimal number of clusters

Here I try to find the optimal number of clusters using the gap statistic method.
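A minimal sketch of the gap statistic is given below. It assumes k-means as the underlying clustering, a uniform reference distribution over the bounding box of the data, and the usual selection rule (the smallest k with Gap(k) >= Gap(k+1) - s(k+1)). The function name, the parameter values, and the synthetic blobs are illustrative only.

```python
import numpy as np
from sklearn.cluster import KMeans


def gap_statistic(X, k_max=5, n_refs=5, seed=0):
    """Return (selected k, array of Gap(k) for k = 1..k_max)."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)

    def log_wk(data, k):
        # inertia_ is the within-cluster sum of squared distances W_k
        km = KMeans(n_clusters=k, n_init=5, random_state=seed).fit(data)
        return np.log(km.inertia_)

    gaps, sks = [], []
    for k in range(1, k_max + 1):
        # Reference datasets drawn uniformly over the bounding box of X
        ref = np.array([log_wk(rng.uniform(lo, hi, size=X.shape), k)
                        for _ in range(n_refs)])
        gaps.append(ref.mean() - log_wk(X, k))
        sks.append(ref.std() * np.sqrt(1.0 + 1.0 / n_refs))
    gaps, sks = np.array(gaps), np.array(sks)

    # gaps[k - 1] is Gap(k); gaps[k] and sks[k] belong to k + 1
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - sks[k]:
            return k, gaps
    return k_max, gaps


# Synthetic stand-in for the factor scores (assumption): three separated blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.5, size=(60, 2))
               for c in ([0, 0], [10, 0], [5, 9])])

best_k, gaps = gap_statistic(X)
print("selected k:", best_k)
print("gap values:", np.round(gaps, 3))
```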

DBSCAN

The optimal number of clusters turns out to be 3. From here on, I use several clustering methods to identify the commuting vehicles. First, I try the DBSCAN method.

Determining the initial parameters of DBSCAN
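One common heuristic for choosing `eps` is the sorted k-distance curve; the sketch below assumes that heuristic, which may differ from how the parameters were actually chosen here. The value of `min_samples`, the percentile used in place of reading the elbow by eye, and the synthetic data are all assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

# Synthetic stand-in for the factor scores (assumption): two dense blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(100, 2)) for c in ([0, 0], [5, 5])])

min_samples = 5  # assumed value, not taken from the original analysis

# Querying the training data returns each point as its own first neighbour,
# which matches DBSCAN's convention that min_samples includes the point itself.
nn = NearestNeighbors(n_neighbors=min_samples).fit(X)
distances, _ = nn.kneighbors(X)
k_dist = np.sort(distances[:, -1])

# Crude stand-in for locating the elbow of the k-distance plot by eye.
eps = float(np.percentile(k_dist, 95))

labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)  # -1 marks noise
print(f"eps={eps:.3f}, clusters={n_clusters}, noise={int(np.sum(labels == -1))}")
```

In practice, plotting `k_dist` and picking the distance at the sharp bend usually gives a more defensible `eps` than a fixed percentile.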

DBSCAN Clustering

ISODATA Clustering

Get the initial parameters of ISODATA from k-means

Understanding the statistical results of the k-means clustering

Start ISODATA clustering based on the parameters obtained from k-means

Understanding the statistical characteristics of each cluster
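The text does not say which ISODATA parameters are derived from k-means, so the sketch below is only one plausible reading: it runs k-means with 3 clusters and derives starting centers together with split and merge thresholds from the result. The dictionary keys are hypothetical names, not the arguments of any specific ISODATA implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the 3 factor scores (assumption).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.5, size=(80, 3))
               for c in ([0, 0, 0], [4, 4, 0], [0, 4, 4])])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels, centers = km.labels_, km.cluster_centers_

# Per-cluster statistics that ISODATA thresholds are typically compared against.
sizes = np.bincount(labels, minlength=3)
max_std = np.array([X[labels == k].std(axis=0).max() for k in range(3)])
diff = centers[:, None, :] - centers[None, :, :]
center_dist = np.sqrt((diff ** 2).sum(axis=-1))
min_center_dist = float(center_dist[np.triu_indices(3, k=1)].min())

# Hypothetical parameter names, for illustration only.
isodata_params = {
    "initial_centers": centers,
    "min_cluster_size": int(sizes.min()),
    "split_std_threshold": float(max_std.max()),
    "merge_distance_threshold": min_center_dist,
}
print({k: v for k, v in isodata_params.items() if k != "initial_centers"})
```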

Clustering by fast search and find of density peaks

Decision Graph Plot

Get the decision graph proposed in the paper "Clustering by fast search and find of density peaks". The graph can be used to determine the optimal number of clusters. Here I do not determine the optimal number of clusters with this method, but I still plot the decision graph.

Start clustering by fast search and find of density peaks, and display the result
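Both the decision graph and the clustering itself rest on two quantities per point: the local density rho and the distance delta to the nearest point of higher density. The sketch below is a small NumPy illustration under several assumptions: a cutoff kernel for rho, a cutoff distance taken as a low percentile of all pairwise distances, and centers chosen as the points with the largest rho * delta. The function name and the synthetic data are hypothetical.

```python
import numpy as np


def density_peaks(X, n_clusters, percent=2.0):
    """Return (rho, delta, labels) for a density-peaks clustering of X."""
    n = len(X)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    d_c = np.percentile(dist[np.triu_indices(n, k=1)], percent)

    rho = (dist < d_c).sum(axis=1) - 1       # neighbours within d_c, excluding self
    order = np.argsort(-rho, kind="stable")  # decreasing density, ties by index

    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    delta[order[0]] = dist[order[0]].max()   # convention for the densest point
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]                # points that precede i in density order
        j = higher[np.argmin(dist[i, higher])]
        delta[i], nearest_higher[i] = dist[i, j], j

    # The densest point has the largest rho * delta, so it is always a centre
    # (barring exact ties); every other point inherits a label by propagation.
    gamma = rho * delta
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return rho, delta, labels


# Synthetic stand-in for the factor scores (assumption).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.4, size=(50, 2))
               for c in ([0, 0], [8, 0], [4, 7])])

rho, delta, labels = density_peaks(X, n_clusters=3)
print("cluster sizes:", np.bincount(labels))
```

A scatter plot of `delta` against `rho` reproduces the decision graph: cluster centers stand out as points where both values are large.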

Get the statistical information of each cluster

Get the clusters' Silhouette Coefficient
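A short sketch of the silhouette computation with scikit-learn follows. The k-means labels and the synthetic data are stand-ins (assumptions) for the cluster assignments produced by the methods above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples, silhouette_score

# Synthetic stand-in for the factor scores and cluster labels (assumption).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.5, size=(70, 2))
               for c in ([0, 0], [5, 0], [2, 5])])
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

overall = silhouette_score(X, labels)        # mean over all samples
per_sample = silhouette_samples(X, labels)   # one value per sample
per_cluster = {int(k): float(per_sample[labels == k].mean())
               for k in np.unique(labels)}

print(f"overall silhouette: {overall:.3f}")
print("per-cluster silhouette:", per_cluster)
```

Values close to 1 indicate compact, well-separated clusters, while values near 0 or below suggest overlapping or misassigned samples.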

Get the commuting pattern rules based on the decision tree model

Train the decision tree model

I train the decision tree model on the ISODATA result. After several attempts, I settled on the parameters "max_depth = 6, min_samples_split = 50, min_samples_leaf = 20".
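The training step might look like the sketch below, which assumes scikit-learn's `DecisionTreeClassifier` (the parameter names in the text match its API). Because ISODATA is not available in scikit-learn, k-means labels on synthetic data stand in for the ISODATA result, and the feature names are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for the 3 factor scores (assumption).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 1.0, size=(400, 3))
               for c in ([0, 0, 0], [3, 3, 0], [0, 3, 3])])

# Stand-in for the ISODATA cluster labels (assumption).
cluster_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, cluster_labels, test_size=0.3, random_state=0, stratify=cluster_labels)

# Parameters as stated in the text.
clf = DecisionTreeClassifier(max_depth=6, min_samples_split=50,
                             min_samples_leaf=20, random_state=0)
clf.fit(X_tr, y_tr)
test_accuracy = clf.score(X_te, y_te)

# Human-readable if/else rules; the feature names are hypothetical.
rules_text = export_text(clf, feature_names=["factor_1", "factor_2", "factor_3"])
print(f"test accuracy: {test_accuracy:.3f}")
print(rules_text)
```

Limiting the depth and enforcing a minimum leaf size keeps the number of rules small enough to read, at the cost of some accuracy.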

Feature importance ranking

Decision tree accuracy with different numbers of features

Plot how the 3 model evaluation metrics change with the number of features used
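The sketch below ranks features by importance and then retrains the tree on the top 1, 2, ... features. The three metrics are not named in the text; accuracy, macro precision, and macro recall are assumed here, and the data is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the labelled samples (assumption).
X, y = make_classification(n_samples=1500, n_features=8, n_informative=4,
                           n_redundant=0, n_classes=3, n_clusters_per_class=1,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

params = dict(max_depth=6, min_samples_split=50, min_samples_leaf=20,
              random_state=0)

# Rank features by the importance assigned by a tree trained on all of them.
full_tree = DecisionTreeClassifier(**params).fit(X_tr, y_tr)
ranking = np.argsort(full_tree.feature_importances_)[::-1]

results = []
for k in range(1, X.shape[1] + 1):
    cols = ranking[:k]  # the k most important features
    clf = DecisionTreeClassifier(**params).fit(X_tr[:, cols], y_tr)
    pred = clf.predict(X_te[:, cols])
    results.append((
        k,
        accuracy_score(y_te, pred),
        precision_score(y_te, pred, average="macro", zero_division=0),
        recall_score(y_te, pred, average="macro", zero_division=0),
    ))

for k, acc, prec, rec in results:
    print(f"{k} features: accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

Plotting the three columns of `results` against `k` gives the line chart described in this step.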

Precision and recall under different rules

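The model above yields 3 commuting rules in total. The 3 rules are sorted by Gini impurity in ascending order; I then analyze the precision and recall with only 1 rule, with 2 rules, and so on, and plot the results as a line chart. Note that the precision and related metrics here are computed differently from the decision tree accuracy reported earlier.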
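One way to read this procedure is sketched below: every leaf that predicts the commuting class is treated as one rule, the rules are sorted by leaf Gini impurity, and precision and recall are computed for the samples covered by the first 1, 2, ... rules. The binary synthetic data and the choice of class 1 as the commuting class are assumptions, so the number of rules here will not match the 3 rules of the original model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary stand-in (assumption): class 1 plays the role of "commuting".
X, y = make_classification(n_samples=2000, n_features=6, n_informative=3,
                           n_redundant=0, random_state=0)
clf = DecisionTreeClassifier(max_depth=6, min_samples_split=50,
                             min_samples_leaf=20, random_state=0).fit(X, y)

tree = clf.tree_
target = 1
is_leaf = tree.children_left == -1               # leaves have no children
leaf_class = tree.value[:, 0, :].argmax(axis=1)  # majority class per node

# Each leaf predicting the target class is one rule; purest leaves first.
rule_leaves = np.where(is_leaf & (leaf_class == target))[0]
rule_leaves = rule_leaves[np.argsort(tree.impurity[rule_leaves], kind="stable")]

leaf_of_sample = clf.apply(X)                    # leaf index for each sample
n_target = int(np.sum(y == target))

curve = []
for n_rules in range(1, len(rule_leaves) + 1):
    covered = np.isin(leaf_of_sample, rule_leaves[:n_rules])
    true_pos = int(np.sum(covered & (y == target)))
    precision = true_pos / int(covered.sum())
    recall = true_pos / n_target
    curve.append((n_rules, precision, recall))

for n_rules, precision, recall in curve:
    print(f"{n_rules} rule(s): precision={precision:.3f} recall={recall:.3f}")
```

Adding rules in order of increasing impurity typically raises recall while precision drifts down, which is the trade-off the line chart is meant to show.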

Get more information about the decision tree model